The Buckwalter Arabic transliteration was developed at Xerox by Tim Buckwalter in the 1990s. It is an ASCII-only transliteration scheme that represents Arabic orthography strictly one-to-one, unlike the more common romanization schemes, which add morphological information not expressed in Arabic script. Thus, for example, the letter waw (و) is transliterated as ''w'' regardless of whether it is realized as a vowel or a consonant. Only when the waw is modified by a hamza (ؤ) does the transliteration change to ''&''. The unmodified letters are straightforward to read (except for ''*'' = dhaal, ''E'' = ayin, and ''v'' = thaa), but the transliterations of letters with diacritics and of the harakat take some time to get used to: for example, the nunated harakat (tanwin) appear as ''N'', ''F'' and ''K'', and the sukun ("no vowel") as ''o''. Ta marbouta (ة) is ''p''.

Since the original Buckwalter scheme was developed, several variants have emerged, not all of which are standardized. Standard Buckwalter transliteration is not compatible with XML, because it uses the XML markup characters ''<'', ''>'' and ''&'' for إ, أ and ؤ respectively; "XML-safe" versions therefore often replace these three characters, and Buckwalter suggests transliterating them as ''I'', ''O'' and ''W'', respectively. Completely "safe" transliteration schemes replace all non-alphanumeric characters (such as $, ' and *) with alphanumeric characters. For a complete description of the different Buckwalter schemes and a more detailed discussion of the trade-offs between them, see Habash (2010).〔Habash, Nizar. ''Introduction to Arabic Natural Language Processing''. Morgan & Claypool, 2010.〕

When transliterating Arabic text, several other issues may arise. First, some characters that occur in Arabic text are not specified in the transliteration table, including non-alphabetic symbols such as ۞, punctuation such as ؛ and ؟, and "Hindi" or "Eastern Arabic" numerals. Similarly, Arabic text sometimes borrows non-Arabic letters from Persian, some of which are defined in the full Buckwalter table.〔Buckwalter, Tim. ''Buckwalter Arabic Transliteration Table''.〕 Symbols that are not defined in the transliteration table may be deleted, kept as non-Latin symbols embedded in the transliterated text, or transliterated into different, non-conflicting Latin symbols. (For instance, converting Eastern Arabic (Hindi) numerals to the Western Arabic digits 0-9 is straightforward.)

Another issue is how to transliterate Arabic text that contains embedded ASCII text, for instance an Arabic sentence that refers to "IBM" or one that includes a quotation in English. If the Latin text is not explicitly marked, it is difficult to distinguish transliterated Arabic from genuine Latin text, and if the transliterated text is later converted back to Arabic, the embedded Latin text will be turned into meaningless Arabic.

Finally, another important decision is how much normalization of the Arabic text to perform during transliteration. This may include removing the kashida (ـ), removing short vowels and/or other diacritics, and normalizing spelling.

==Buckwalter transliteration table==
:hamza
:*lone hamza: '
:*hamza on alif: >
:*hamza below alif: <
:*hamza on waw: &
:*hamza on ya: }
:alif
:*madda on alif: |
:*alif al-wasla: {
:*dagger alif: `
:*alif maqsura: Y
:harakat
:*fatha: a
:*damma: u
:*kasra: i
:*fathatayn: F
:*dammatayn: N
:*kasratayn: K
:*shadda: ~
:*sukun: o
:ta marbouta: p
:tatwil: _

Source: Wikipedia, the free encyclopedia (article "Buckwalter transliteration").
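Because the scheme is strictly one-to-one, a transliterator can be a single character lookup table, and the reverse direction is just that table inverted. The following is a minimal Python sketch, not Buckwalter's own tool: the names `to_buckwalter`, `from_buckwalter`, `AR_TO_BW` and `BW_TO_AR` are hypothetical, and the mapping is the standard Buckwalter table for the core Arabic letters and diacritics (the table above lists only part of it). Characters not in the table are passed through unchanged, which corresponds to the "kept as non-Latin symbols" option for undefined symbols.

```python
"""Minimal sketch of a one-to-one Buckwalter transliterator (hypothetical names)."""

# Standard Buckwalter table for the core Arabic letters and diacritics,
# written with \u escapes so right-to-left glyphs do not confuse editors.
AR_TO_BW = {
    "\u0621": "'",  # lone hamza
    "\u0622": "|",  # madda on alif
    "\u0623": ">",  # hamza on alif
    "\u0624": "&",  # hamza on waw
    "\u0625": "<",  # hamza below alif
    "\u0626": "}",  # hamza on ya
    "\u0627": "A",  # bare alif
    "\u0628": "b", "\u0629": "p",  # ba, ta marbouta
    "\u062A": "t", "\u062B": "v",  # ta, thaa
    "\u062C": "j", "\u062D": "H", "\u062E": "x",  # jim, ha, kha
    "\u062F": "d", "\u0630": "*",  # dal, dhaal
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",  # ra, zay, sin, shin
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",  # sad, dad, ta, za
    "\u0639": "E", "\u063A": "g",  # ayin, ghayn
    "\u0640": "_",  # tatwil (kashida)
    "\u0641": "f", "\u0642": "q", "\u0643": "k", "\u0644": "l",
    "\u0645": "m", "\u0646": "n", "\u0647": "h",
    "\u0648": "w",  # waw, whether realized as a vowel or a consonant
    "\u0649": "Y",  # alif maqsura
    "\u064A": "y",
    "\u064B": "F", "\u064C": "N", "\u064D": "K",  # fathatayn, dammatayn, kasratayn
    "\u064E": "a", "\u064F": "u", "\u0650": "i",  # fatha, damma, kasra
    "\u0651": "~", "\u0652": "o",  # shadda, sukun
    "\u0670": "`",  # dagger alif
    "\u0671": "{",  # alif al-wasla
}

# Because the scheme is one-to-one, the reverse table is a plain inversion.
BW_TO_AR = {bw: ar for ar, bw in AR_TO_BW.items()}
assert len(BW_TO_AR) == len(AR_TO_BW), "mapping must be one-to-one"

_TO_BW = str.maketrans(AR_TO_BW)
_FROM_BW = str.maketrans(BW_TO_AR)


def to_buckwalter(text: str) -> str:
    """Transliterate Arabic script to Buckwalter; unknown characters pass through."""
    return text.translate(_TO_BW)


def from_buckwalter(text: str) -> str:
    """Map Buckwalter back to Arabic script; unknown characters pass through."""
    return text.translate(_FROM_BW)


if __name__ == "__main__":
    word = "\u0643\u062A\u0627\u0628"  # kaf, ta, alif, ba
    print(to_buckwalter(word))  # ktAb
    print(from_buckwalter("ktAb") == word)  # True
```

Running `from_buckwalter` on unmarked English text such as "Hello" maps ''H'', ''l'' and ''o'' to Arabic characters and leaves ''e'' untouched, which is the embedded-Latin problem described above.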
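The XML-safe variant described above, in which Buckwalter suggests ''I'', ''O'' and ''W'' for ''<'', ''>'' and ''&'', can be layered on top of standard Buckwalter output as a second character-level table. This is a minimal sketch assuming the input is already standard Buckwalter text; the function names `to_xml_safe` and `from_xml_safe` are hypothetical. The scheme works because ''I'', ''O'' and ''W'' are not otherwise used in the standard table, so the substitution can be reversed.

```python
from xml.sax.saxutils import escape

# Buckwalter's suggested XML-safe replacements for the three characters
# that clash with XML markup:
#   <  hamza below alif  ->  I
#   >  hamza on alif     ->  O
#   &  hamza on waw      ->  W
_TO_XML_SAFE = str.maketrans({"<": "I", ">": "O", "&": "W"})
_FROM_XML_SAFE = str.maketrans({"I": "<", "O": ">", "W": "&"})


def to_xml_safe(bw: str) -> str:
    """Rewrite standard Buckwalter text into the XML-safe variant."""
    return bw.translate(_TO_XML_SAFE)


def from_xml_safe(bw: str) -> str:
    """Undo to_xml_safe; assumes the input holds no other literal I, O or W."""
    return bw.translate(_FROM_XML_SAFE)


if __name__ == "__main__":
    standard = "<ilaY >ayna"  # sample standard Buckwalter text with < and >
    safe = to_xml_safe(standard)
    print(safe)  # IilaY Oayna
    # The safe form needs no escaping when embedded in XML.
    print(escape(safe) == safe)  # True
```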
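Several of the normalization choices mentioned above (converting Eastern Arabic numerals, removing the kashida, optionally removing diacritics) can be applied to the Arabic text before it is transliterated. The sketch below covers only those steps; spelling normalization is omitted because which spellings to merge is a project-specific decision. The function name `normalize_arabic` and its `strip_diacritics` flag are hypothetical, and treating U+064B to U+0652 plus the dagger alif (U+0670) as the removable diacritics is an assumption that matches the harakat listed in the table.

```python
import re

# Eastern Arabic-Indic digits (U+0660..U+0669) and the extended/Persian
# variants (U+06F0..U+06F9) both map to ASCII 0..9.
_DIGITS = str.maketrans(
    {chr(base + d): str(d) for base in (0x0660, 0x06F0) for d in range(10)}
)
# Tanwin, short vowels, shadda and sukun (U+064B..U+0652) plus dagger alif.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_KASHIDA = "\u0640"  # tatwil, a purely typographic elongation


def normalize_arabic(text: str, strip_diacritics: bool = False) -> str:
    """Apply optional normalization steps before transliteration."""
    text = text.translate(_DIGITS)
    text = text.replace(_KASHIDA, "")
    if strip_diacritics:
        text = _DIACRITICS.sub("", text)
    return text


if __name__ == "__main__":
    # kaf+fatha, kashida, ta+fatha, ba+fatha, space, Eastern digits 3 and 2
    sample = "\u0643\u064E\u0640\u062A\u064E\u0628\u064E \u0663\u0662"
    print(normalize_arabic(sample, strip_diacritics=True) == "\u0643\u062A\u0628 32")  # True
```

Keeping `strip_diacritics` off by default preserves the one-to-one property for vowelled text; turning it on trades that information for simpler matching.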